58 research outputs found

    Enhanced clustering analysis pipeline for performance analysis of parallel applications

    Clustering analysis is widely used to group data items into the same cluster when they are similar according to specific metrics. We can use cluster analysis to group the CPU bursts of a parallel application, that is, the regions on each process in between communication calls or calls to the parallel runtime. The resulting clusters are the different computational trends or phases that appear in the application. These clusters are useful for understanding the behavior of the computation part of the application and for focusing the analysis on those that present performance issues. Although density-based clustering algorithms are a powerful and efficient tool for summarizing this type of information, their traditional user-guided clustering methodology has many shortcomings in dealing with the complexity of data, the diversity of data structures, the high dimensionality of data, and the dramatic increase in the amount of data. Consequently, the majority of DBSCAN-like algorithms have difficulty handling high-dimensional and/or multi-density data, and they are sensitive to their hyper-parameter configuration. Furthermore, extracting insight from the obtained clusters is an intuitive and manual task. To mitigate these weaknesses, we propose a new unified approach that replaces user-guided clustering with an automated clustering analysis pipeline, called the Enhanced Cluster Identification and Interpretation (ECII) pipeline. To build the pipeline, we propose novel techniques, namely Robust Independent Feature Selection, Feature Space Curvature Map, Organization Component Analysis, and hyper-parameter tuning, which respectively address feature selection, density homogenization, cluster interpretation, and model selection, the main components of our machine learning pipeline. This thesis contributes four new techniques to the Machine Learning field, with a particular use case in the Performance Analytics field.
The first contribution is a novel unsupervised approach for feature selection on noisy data, called Robust Independent Feature Selection (RIFS). Specifically, we choose a feature subset that contains most of the underlying information, using the same criteria as Independent Component Analysis. Simultaneously, the noise is separated as an independent component. The second contribution of the thesis is a parametric multilinear transformation method that homogenizes cluster densities while preserving the topological structure of the dataset, called Feature Space Curvature Map (FSCM). We present a new Gravitational Self-organizing Map that models the feature space curvature by plugging the concepts of gravity and the fabric of space into the Self-organizing Map algorithm to mathematically describe the density structure of the data. To homogenize the cluster density, we introduce a novel mapping mechanism that projects the data from the non-Euclidean curved space to a new Euclidean flat space. The third contribution is a novel topology-based method for studying potentially complex high-dimensional categorized data by quantifying their shapes and extracting fine-grained insights from them to interpret the clustering result. We introduce our Organization Component Analysis (OCA) method for the automatic study of arbitrary cluster shapes without any assumption about the data distribution. Finally, to tune the DBSCAN hyper-parameters, we propose a new tuning mechanism that combines techniques from the machine learning and optimization domains, and we embed it in the ECII pipeline. Using this cluster analysis pipeline with the CPU burst data of a parallel application, we provide the developer/analyst with high-quality detection of the SPMD computation structure, with the added value that it reflects the fine grain of the computation regions. Postprint (published version)

    Locating Emergency Facilities Using the Weighted k-median Problem: A Graph-metaheuristic Approach

    An efficient approach is presented for finding optimal facility locations using the k-median method. First, the region to be investigated is meshed and an incidence graph is constructed to obtain the connectivity properties of the meshes. Then shortest route trees (SRTs) are rooted from the nodes of the generated graph. Subsequently, the k-median approach is used to divide the nodes of the graph, and hence the studied region, into k optimal subregions. The weights of the nodes represent risk factors such as population and seismic and topographic conditions, so that facilities are located in the high-risk zones for better service. To find the optimal facility locations, a recently developed meta-heuristic algorithm called Colliding Bodies Optimization (CBO) is used. The performance of the proposed method is investigated through different alternatives for minimizing the cost of the weighted k-median problem. As a case study, the Mazandaran province in Iran is considered, and the above graph-metaheuristic approach is used to locate the facilities.
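
The weighted k-median objective described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the five-node path graph, the edge lengths, the node weights, and the function names are hypothetical, and an exhaustive search over candidate median sets stands in for CBO, which is not reproduced here. The cost of a set of medians is the sum over nodes of the node weight times the shortest-route distance to the nearest median.

```python
import heapq
from itertools import combinations


def shortest_route_tree(adj, root):
    """Dijkstra distances from root; adj maps node -> {neighbor: edge_length}."""
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def weighted_k_median_cost(dist, weights, medians):
    """Sum of weight * distance to the nearest median over all nodes."""
    return sum(w * min(dist[m][node] for m in medians)
               for node, w in weights.items())


def best_medians(adj, weights, k):
    """Exhaustive search over all k-subsets (a stand-in for a metaheuristic)."""
    dist = {n: shortest_route_tree(adj, n) for n in adj}
    best = min(combinations(sorted(adj), k),
               key=lambda ms: weighted_k_median_cost(dist, weights, ms))
    return best, weighted_k_median_cost(dist, weights, best)


# Hypothetical path graph a-b-c-d-e with unit edge lengths (connected graph assumed).
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0},
    "e": {"d": 1.0},
}
# Hypothetical risk weights; node "e" is the high-risk zone.
weights = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 10}

print(best_medians(adj, weights, 2))
```

With these weights the search places one facility directly on the high-risk node, which is the effect the weighting is meant to produce. Exhaustive search is only feasible for tiny graphs, which is why a metaheuristic is used on realistic meshes.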

    Optimal Design of the Monopole Structures Using the CBO and ECBO Algorithms

    Tubular steel monopole structures are widely used for supporting antennas in the telecommunication industry. This research presents two recently developed meta-heuristic algorithms, Colliding Bodies Optimization (CBO) and Enhanced Colliding Bodies Optimization (ECBO), for size optimization of monopole steel structures. The design procedure aims to obtain the minimum weight of monopole structures subject to the TIA-EIA222F specification. Two monopole structure examples are examined to verify the suitability of the design procedure and to demonstrate the effectiveness and robustness of CBO and ECBO in producing optimal designs for this problem. The outcomes of ECBO are also compared with those of standard CBO to illustrate the importance of the enhancement to the CBO algorithm.

    Feature space curvature map: A method to homogenize cluster densities

    The majority of density-based clustering algorithms cannot perform properly when the data exhibit very different densities across the feature space. These algorithms implicitly presume that all clusters have almost the same density and therefore normally use global parameters. Consequently, they are often biased towards finding dense clusters over sparse ones. In this paper, we propose a parametric multilinear transformation method to homogenize cluster densities while preserving the topological structure of the dataset. The transformed clusters have approximately the same density, while all inter-cluster regions become globally low-density. In our method, the feature space is locally bent by dense concentrations of data points, in the same way that stars bend space-time in the theory of relativity. We present a new Gravitational Self-organizing Map that models the feature space curvature by plugging the concepts of gravity and the fabric of space into the Self-organizing Map algorithm to mathematically describe the density structure of the data. To homogenize the cluster density, we introduce a novel mapping mechanism that projects the data from a non-Euclidean curved space to a new Euclidean flat space. Specifically, this mechanism transfers the basis vectors instead of the feature vectors, which guarantees the continuity of the mapping function and reduces the computational cost of the algorithm. As a result, our method can efficiently and explicitly homogenize the density of any dataset globally, so that existing clustering algorithms can then be applied without modification. Our experimental results on both real-world and synthetic datasets show that our approach outperforms the current statistics-based methods. Peer Reviewed. Postprint (author's final draft)
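
The bias towards dense clusters mentioned in this abstract can be reproduced with a small sketch. It is not the FSCM method itself: it is a minimal textbook-style DBSCAN written for illustration, and the dataset (two dense groups next to each other plus one sparse group), the `eps` values, and `min_pts` are all hypothetical choices. The point is only that a single global `eps` either drops the sparse group as noise or merges the two dense groups.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN; returns one label per point (-1 means noise)."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # only core points expand
        cluster += 1
    return labels


# Hypothetical data: two dense groups (spacing 0.1) and one sparse group (spacing 1.0).
dense_a = [(0.1 * i, 0.0) for i in range(10)]
dense_b = [(2.0 + 0.1 * i, 0.0) for i in range(10)]
sparse_c = [(10.0 + 1.0 * i, 0.0) for i in range(10)]
points = dense_a + dense_b + sparse_c

tight = dbscan(points, eps=0.25, min_pts=3)  # tuned for the dense groups
loose = dbscan(points, eps=1.5, min_pts=3)   # tuned for the sparse group

print("eps=0.25:", tight)
print("eps=1.5: ", loose)
```

With the small `eps` the two dense groups are separated but every sparse point is labeled noise; with the large `eps` the sparse group is recovered but the two dense groups collapse into one cluster. Homogenizing densities before clustering, as the abstract proposes, is one way to remove this trade-off.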

    Optimal design of structures with multiple natural frequency constraints using a hybridized BB-BC/Quasi-Newton algorithm

    A hybridization of the quasi-Newton and Big Bang-Big Crunch optimization algorithms (QN-BBBC) is proposed to find the optimal weight of structures subjected to multiple natural frequency constraints. The algorithm combines a mathematical algorithm (quasi-Newton) for local search with a meta-heuristic algorithm (Big Bang-Big Crunch) for global search, which also helps the search escape traps. Four examples are presented for the optimization of trusses, and two examples are studied for the optimization of frames with frequency constraints. The examples are widely reported and used as benchmarks in the related literature. The numerical results reveal the robustness and high performance of the suggested method for structural optimization with frequency constraints.

    Improving clustering analysis of performance data

    No full text
